Introduction

This markdown imports the processed data and conducts the exploratory data analysis. Follow the processing markdowns in the processing_code folder to generate the processed data.


This analysis examines the predictors of allocation of FEMA funds during disaster declarations. In terms of the datasets, the two outcomes of interest are Requested Amount (ReqAmt) and Obligated Amount (OblAmt).


For each variable, the following steps will be completed:

  1. Produce and print some numerical output (e.g. table, summary statistics)
  2. Create a histogram or density plot (continuous variables only)
  3. Scatterplot, boxplot, or other similar plots against the main outcome of interest
  4. Any other exploration steps that may be useful.

The steps will be numbered for each variable as specified above.


Required Packages

The following R packages are required for this script:


Load Processed Data

Load the data generated from the markdowns in the processing_code folder.

#define file path
data_location <- here::here("data", "processed_data", "combinedprocessed.rds")

#load data
EDAdata <- readRDS(data_location)

Data Overview

To better understand the data, let’s use summarytools to better visualize the data.

summarytools::dfSummary(EDAdata)
## Data Frame Summary  
## EDAdata  
## Dimensions: 446 x 21  
## Duplicates: 0  
## 
## ----------------------------------------------------------------------------------------------------------------------------
## No   Variable           Stats / Values                      Freqs (% of Valid)    Graph                 Valid      Missing  
## ---- ------------------ ----------------------------------- --------------------- --------------------- ---------- ---------
## 1    state              1. LA                                22 ( 4.9%)                                 446        0        
##      [character]        2. TX                                21 ( 4.7%)                                 (100.0%)   (0.0%)   
##                         3. CA                                20 ( 4.5%)                                                     
##                         4. FL                                18 ( 4.0%)                                                     
##                         5. WV                                16 ( 3.6%)                                                     
##                         6. MS                                15 ( 3.4%)                                                     
##                         7. NC                                14 ( 3.1%)                                                     
##                         8. GA                                13 ( 2.9%)                                                     
##                         9. SC                                13 ( 2.9%)                                                     
##                         10. OK                               12 ( 2.7%)                                                     
##                         [ 41 others ]                       282 (63.2%)           IIIIIIIIIIII                              
## 
## 2    disasterNumber     Mean (sd) : 4100.9 (418.7)          445 distinct values   :           . :   .   446        0        
##      [integer]          min < med < max:                                          :           : : : :   (100.0%)   (0.0%)   
##                         3345 < 4237.5 < 4624                                      :           : : : :                       
##                         IQR (CV) : 859.2 (0.1)                                    : :       : : : : :                       
##                                                                                   : :       : : : : :                       
## 
## 3    IncidentYear       Mean (sd) : 2017 (2.9)              2012 :  43 ( 9.6%)    I                     446        0        
##      [numeric]          min < med < max:                    2013 :  41 ( 9.2%)    I                     (100.0%)   (0.0%)   
##                         2012 < 2018 < 2021                  2014 :  26 ( 5.8%)    I                                         
##                         IQR (CV) : 5 (0)                    2015 :  37 ( 8.3%)    I                                         
##                                                             2016 :  27 ( 6.1%)    I                                         
##                                                             2017 :  48 (10.8%)    II                                        
##                                                             2018 :  42 ( 9.4%)    I                                         
##                                                             2019 :  41 ( 9.2%)    I                                         
##                                                             2020 : 115 (25.8%)    IIIII                                     
##                                                             2021 :  26 ( 5.8%)    I                                         
## 
## 4    IncidentMonth      Mean (sd) : 5.6 (3.6)               12 distinct values    :                     446        0        
##      [numeric]          min < med < max:                                          :                     (100.0%)   (0.0%)   
##                         1 < 6 < 12                                                :                                         
##                         IQR (CV) : 7 (0.6)                                        :   .   .   . : :                         
##                                                                                   : : : : : . : : : :                       
## 
## 5    declarationType    1. DR                               330 (74.0%)           IIIIIIIIIIIIII        446        0        
##      [character]        2. EM                               116 (26.0%)           IIIII                 (100.0%)   (0.0%)   
## 
## 6    incidentType       1. Severe Storm(s)                  117 (26.2%)           IIIII                 446        0        
##      [character]        2. Hurricane                        105 (23.5%)           IIII                  (100.0%)   (0.0%)   
##                         3. Biological                        78 (17.5%)           III                                       
##                         4. Flood                             75 (16.8%)           III                                       
##                         5. Fire                              22 ( 4.9%)                                                     
##                         6. Severe Ice Storm                  12 ( 2.7%)                                                     
##                         7. Tornado                           11 ( 2.5%)                                                     
##                         8. Earthquake                         5 ( 1.1%)                                                     
##                         9. Snow                               5 ( 1.1%)                                                     
##                         10. Coastal Storm                     4 ( 0.9%)                                                     
##                         [ 6 others ]                         12 ( 2.7%)                                                     
## 
## 7    declarationTitle   1. COVID-19 PANDEMIC                 52 (11.7%)           II                    446        0        
##      [character]        2. SEVERE STORMS AND FLOODIN         38 ( 8.5%)           I                     (100.0%)   (0.0%)   
##                         3. SEVERE STORMS, TORNADOES,         33 ( 7.4%)           I                                         
##                         4. COVID-19                          26 ( 5.8%)           I                                         
##                         5. HURRICANE SANDY                   17 ( 3.8%)                                                     
##                         6. SEVERE STORMS, FLOODING,          16 ( 3.6%)                                                     
##                         7. SEVERE STORMS, TORNADOES,         14 ( 3.1%)                                                     
##                         8. SEVERE WINTER STORM               13 ( 2.9%)                                                     
##                         9. WILDFIRES                         12 ( 2.7%)                                                     
##                         10. HURRICANE IRMA                   10 ( 2.2%)                                                     
##                         [ 105 others ]                      215 (48.2%)           IIIIIIIII                                 
## 
## 8    IncidentDuration   Mean (sd) : 128.2 (246)             70 distinct values    :                     446        0        
##      [numeric]          min < med < max:                                          :                     (100.0%)   (0.0%)   
##                         0 < 12 < 660                                              :                                         
##                         IQR (CV) : 34.7 (1.9)                                     :                                         
##                                                                                   :                 :                       
## 
## 9    ResponseDays       Mean (sd) : 1332.2 (914.4)          327 distinct values     :                   446        0        
##      [numeric]          min < med < max:                                          . :                   (100.0%)   (0.0%)   
##                         70 < 1098 < 3588                                          : : . .                                   
##                         IQR (CV) : 1358.2 (0.7)                                   : : : : : .                               
##                                                                                   : : : : : : :                             
## 
## 10   IH                 Min  : 0                            0 : 267 (59.9%)       IIIIIIIIIII           446        0        
##      [numeric]          Mean : 0.4                          1 : 179 (40.1%)       IIIIIIII              (100.0%)   (0.0%)   
##                         Max  : 1                                                                                            
## 
## 11   IH_pct             Mean (sd) : 0.2 (1.4)               147 distinct values   :                     446        0        
##      [numeric]          min < med < max:                                          :                     (100.0%)   (0.0%)   
##                         0 < 0 < 27.7                                              :                                         
##                         IQR (CV) : 0.1 (5.5)                                      :                                         
##                                                                                   :                                         
## 
## 12   PA                 Min  : 0                            0 :  12 ( 2.7%)                             446        0        
##      [numeric]          Mean : 1                            1 : 434 (97.3%)       IIIIIIIIIIIIIIIIIII   (100.0%)   (0.0%)   
##                         Max  : 1                                                                                            
## 
## 13   PA_pct             Mean (sd) : 0.6 (1.3)               284 distinct values   :                     446        0        
##      [numeric]          min < med < max:                                          :                     (100.0%)   (0.0%)   
##                         0 < 0.3 < 27.3                                            :                                         
##                         IQR (CV) : 0.9 (2.4)                                      :                                         
##                                                                                   :                                         
## 
## 14   HM                 Min  : 0                            0 : 117 (26.2%)       IIIII                 446        0        
##      [numeric]          Mean : 0.7                          1 : 329 (73.8%)       IIIIIIIIIIIIII        (100.0%)   (0.0%)   
##                         Max  : 1                                                                                            
## 
## 15   HM_pct             Mean (sd) : 0.4 (1.3)               256 distinct values   :                     446        0        
##      [numeric]          min < med < max:                                          :                     (100.0%)   (0.0%)   
##                         0 < 0.2 < 27.2                                            :                                         
##                         IQR (CV) : 0.4 (3.5)                                      :                                         
##                                                                                   :                                         
## 
## 16   Population         1. 4,657,757                         22 ( 4.9%)                                 446        0        
##      [character]        2. 29,145,505                        21 ( 4.7%)                                 (100.0%)   (0.0%)   
##                         3. 39,538,223                        20 ( 4.5%)                                                     
##                         4. 21,538,187                        18 ( 4.0%)                                                     
##                         5. 1,793,716                         16 ( 3.6%)                                                     
##                         6. 2,961,279                         15 ( 3.4%)                                                     
##                         7. 10,439,388                        14 ( 3.1%)                                                     
##                         8. 10,711,908                        13 ( 2.9%)                                                     
##                         9. 5,118,425                         13 ( 2.9%)                                                     
##                         10. 3,959,353                        12 ( 2.7%)                                                     
##                         [ 41 others ]                       282 (63.2%)           IIIIIIIIIIII                              
## 
## 17   Counties           Mean (sd) : 74.8 (52.3)             46 distinct values        :                 446        0        
##      [integer]          min < med < max:                                              : :               (100.0%)   (0.0%)   
##                         1 < 67 < 254                                                  : :                                   
##                         IQR (CV) : 46.8 (0.7)                                     : : : :                                   
##                                                                                   : : : : .   .     .                       
## 
## 18   TotalAgencies      Mean (sd) : 4.9 (6.6)               31 distinct values    :                     446        0        
##      [numeric]          min < med < max:                                          :                     (100.0%)   (0.0%)   
##                         0 < 2 < 55                                                :                                         
##                         IQR (CV) : 6 (1.3)                                        :                                         
##                                                                                   : : . .                                   
## 
## 19   ReqAmt             Mean (sd) : 29673687 (218785540)    332 distinct values   :                     446        0        
##      [numeric]          min < med < max:                                          :                     (100.0%)   (0.0%)   
##                         -855200.3 < 109471.6 < 4388540486                         :                                         
##                         IQR (CV) : 1467763 (7.4)                                  :                                         
##                                                                                   :                                         
## 
## 20   OblAmt             Mean (sd) : 29744109 (219676788)    332 distinct values   :                     446        0        
##      [numeric]          min < med < max:                                          :                     (100.0%)   (0.0%)   
##                         -855200.3 < 109471.6 < 4408424297                         :                                         
##                         IQR (CV) : 1470450 (7.4)                                  :                                         
##                                                                                   :                                         
## 
## 21   FEMACSAvg          Mean (sd) : 0.8 (0.4)               137 distinct values                     :   446        0        
##      [numeric]          min < med < max:                                                            :   (100.0%)   (0.0%)   
##                         0 < 1 < 1                                                                   :                       
##                         IQR (CV) : 0.1 (0.4)                                                        :                       
##                                                                                   :             . . :                       
## ----------------------------------------------------------------------------------------------------------------------------

Looking at the output of summarytools::dfsummary()

Let’s dive a little deeper into each variable.


Outcome of Interest: Requested FEMA Funds

The first of two primary outcomes of interest is the total amount of funds the state requested from the federal government for a disaster declaration.

#summary statistics
base::summary(EDAdata$ReqAmt)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
##    -855200          0     109472   29673687    1467763 4388540486
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = ReqAmt)) +
  geom_density()

#try a log transformation out of curiosity 
#this does ignore all negative values
ggplot2::ggplot(data = EDAdata, aes(x = log(ReqAmt))) +
  geom_density()
## Warning in log(ReqAmt): NaNs produced

## Warning in log(ReqAmt): NaNs produced
## Warning: Removed 120 rows containing non-finite values (stat_density).

#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = ReqAmt)) +
  geom_boxplot()

#crop to $1M to get a better idea
ggplot2::ggplot(data = EDAdata, aes(y = ReqAmt)) +
  geom_boxplot() +
  ylim(0, 1000000)
## Warning: Removed 138 rows containing non-finite values (stat_boxplot).

#log transformed boxplot
ggplot2::ggplot(data = EDAdata, aes(y = log(ReqAmt))) +
  geom_boxplot()
## Warning in log(ReqAmt): NaNs produced
## Warning in log(ReqAmt): NaNs produced
## Warning: Removed 120 rows containing non-finite values (stat_boxplot).

##QQ plot
car::qqPlot(EDAdata$ReqAmt)

## [1] 328 273

These outputs show the requested amounts do not follow a normal distribution, which is to be expected given the broad range of disasters. We may need to remove outliers in the analysis, but for now, we’ll leave them to capture the true nature of disaster declarations. The largest outlier is likely the COVID-19 Pandemic. The log transformation does a better job of showing the distribution of the data, so we may need to consider an exponential model in the analysis; however, log(negative number) is undefined, so we would be ignoring all reimbursement data. The transformed and untransformed figures suggest there are significant tails in the data.


Outcome of Interest: Obligated FEMA Funds

The alternative outcome of interest is the funding obligated from FEMA. While this may be the same as ReqAmt (especially for large-scale disasters), FEMA doesn’t always pay the amount the states request.

#summary statistics
base::summary(EDAdata$OblAmt)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
##    -855200          0     109472   29744109    1470450 4408424297
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = OblAmt)) +
  geom_density()

#try a log transformation out of curiosity 
ggplot2::ggplot(data = EDAdata, aes(x = log(OblAmt))) +
  geom_density()
## Warning in log(OblAmt): NaNs produced

## Warning in log(OblAmt): NaNs produced
## Warning: Removed 120 rows containing non-finite values (stat_density).

#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = OblAmt)) +
  geom_boxplot()

#crop to $1M to get a better idea
ggplot2::ggplot(data = EDAdata, aes(y = OblAmt)) +
  geom_boxplot() +
  ylim(0, 1000000)
## Warning: Removed 139 rows containing non-finite values (stat_boxplot).

#log transformed boxplot
ggplot2::ggplot(data = EDAdata, aes(y = log(OblAmt))) +
  geom_boxplot()
## Warning in log(OblAmt): NaNs produced
## Warning in log(OblAmt): NaNs produced
## Warning: Removed 120 rows containing non-finite values (stat_boxplot).

##QQ plot
car::qqPlot(EDAdata$OblAmt)

## [1] 328 273

It appears that obligated funds and requested funds have similar distributions and characteristics. The same commentary about the requested funds analysis applies here as well.


Predictor: State

#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$state, report.nas = FALSE)
## Frequencies  
## EDAdata$state  
## Type: Character  
## 
##               Freq        %   % Cum.
## ----------- ------ -------- --------
##          AK      8     1.79     1.79
##          AL     10     2.24     4.04
##          AR     10     2.24     6.28
##          AZ      3     0.67     6.95
##          CA     20     4.48    11.43
##          CO      6     1.35    12.78
##          CT      9     2.02    14.80
##          DC      3     0.67    15.47
##          DE      3     0.67    16.14
##          FL     18     4.04    20.18
##          GA     13     2.91    23.09
##          HI      7     1.57    24.66
##          IA     10     2.24    26.91
##          ID      6     1.35    28.25
##          IL      5     1.12    29.37
##          IN      3     0.67    30.04
##          KS      7     1.57    31.61
##          KY      7     1.57    33.18
##          LA     22     4.93    38.12
##          MA      6     1.35    39.46
##          ME      3     0.67    40.13
##          MI      7     1.57    41.70
##          MN     10     2.24    43.95
##          MO      8     1.79    45.74
##          MS     15     3.36    49.10
##          MT      2     0.45    49.55
##          NC     14     3.14    52.69
##          ND      6     1.35    54.04
##          NE      9     2.02    56.05
##          NH      7     1.57    57.62
##          NJ      5     1.12    58.74
##          NM      6     1.35    60.09
##          NV      3     0.67    60.76
##          NY     11     2.47    63.23
##          OH      7     1.57    64.80
##          OK     12     2.69    67.49
##          OR     10     2.24    69.73
##          PA      8     1.79    71.52
##          PR     11     2.47    73.99
##          RI      6     1.35    75.34
##          SC     13     2.91    78.25
##          SD     11     2.47    80.72
##          TN     11     2.47    83.18
##          TX     21     4.71    87.89
##          UT      1     0.22    88.12
##          VA      7     1.57    89.69
##          VT      6     1.35    91.03
##          WA     12     2.69    93.72
##          WI      9     2.02    95.74
##          WV     16     3.59    99.33
##          WY      3     0.67   100.00
##       Total    446   100.00   100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#filter top 5 states
#start with obligated amount
obl_fig_state <- EDAdata %>%
                    dplyr::add_count(state) %>%
                    dplyr::filter(dplyr::dense_rank(-n) < 6) %>%
                    ggplot2::ggplot(aes(x = state, y = OblAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
                      scale_y_continuous(labels = scales::dollar_format()) +
                      labs(x = "State or Territory",
                           y = "Obligated FEMA Funds",
                           title = "FEMA funds obligated to five most disaster-prone states")
obl_fig_state

#save file
obl_state_fig_file = here("results","obl-state.png")
ggsave(filename = obl_state_fig_file, plot = obl_fig_state)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_state <- EDAdata %>%
                    dplyr::add_count(state) %>%
                    dplyr::filter(dplyr::dense_rank(-n) < 6) %>%
                    ggplot2::ggplot(aes(x = state, y = ReqAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
                      scale_y_continuous(labels = scales::dollar_format()) +
                      labs(x = "State or Territory",
                           y = "Requested FEMA Funds",
                           title = "FEMA funds requested by five most disaster-prone states")
req_fig_state

#save file
req_state_fig_file = here("results","req-state.png")
ggsave(filename = req_state_fig_file, plot = req_fig_state)
## Saving 7 x 5 in image
#create a table where columns represent requested and obligated funds and rows are 5 most disaster-prone states
decs_state_5 <- EDAdata %>%
                    dplyr::add_count(state) %>%
                    dplyr::filter(dense_rank(-n) < 6) %>%
                    gtsummary::select(state, ReqAmt, OblAmt) %>%
                    gtsummary::tbl_summary(
                      by = state,
                      label = list(state ~ "State or Territory",
                                   ReqAmt ~ "Requested Funds",
                                   OblAmt ~ "Obligated Funds"),
                      statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
                    gtsummary::modify_header(label ~ "**Variable**") %>% 
                    gtsummary::modify_spanning_header(c("stat_1", "stat_2", "stat_3", "stat_4", "stat_5") ~ "**State**") %>%
                    gtsummary::modify_caption("**Table 4. FEMA Funding for 5 Most Disaster-Prone States 2012 - 2021**") %>%
                    gtsummary::bold_labels()
decs_state_5
Table 4. FEMA Funding for 5 Most Disaster-Prone States 2012 - 2021
Variable State
CA, N = 201 FL, N = 181 LA, N = 221 TX, N = 211 WV, N = 161
Requested Funds 62,605,465 (169,900,730) 27,337,218 (68,075,813) 34,318,344 (90,594,708) 36,586,287 (119,143,196) 6,659,588 (25,471,101)
Obligated Funds 62,605,465 (169,900,730) 27,337,218 (68,075,813) 34,318,344 (90,594,708) 36,637,886 (119,167,108) 6,660,260 (25,470,955)

1 Mean (SD)

#save file
decs_state_file = here("results", "decs-state-5.Rds")
saveRDS(decs_state_5, file = decs_state_file)

Predictor: Incident Year

#summary statistics since it's continuous
base::summary(EDAdata$IncidentYear)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2012    2015    2018    2017    2020    2021
#examine graphical relationship with outcomes

#density plot
ggplot2::ggplot(data = EDAdata, aes(x = IncidentYear)) +
  geom_density()

#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = IncidentYear)) +
  geom_boxplot()

##QQ plot
car::qqPlot(EDAdata$IncidentYear)

## [1]  8 11
#histogram distribution
year_hist <- EDAdata %>% 
               ggplot2::ggplot(aes(x=IncidentYear)) + 
                        geom_bar(fill="steelblue4") + 
                        scale_x_continuous(breaks = seq(2012, 2021, by = 1)) +
                        scale_y_continuous(expand = c(0, 0), limits = c(0, 125)) +
                        geom_text(stat='count', aes(label=..count..), vjust=-1) +
                        labs(x = "Incident Year",
                             y = "Number of Declarations",
                             title = "Disaster Declarations per Year (2012 - 2021)")
year_hist

#save file
year_fig_file = here("results","incidentyear.png")
ggsave(filename = year_fig_file, plot = year_hist)
## Saving 7 x 5 in image
#examine graphical relationship with outcomes of interest
#look at total cost per incident year
req_fig_year <- EDAdata %>%
                    ggplot2::ggplot(aes(x = IncidentYear)) +
                      geom_bar(aes(y = ReqAmt),
                               stat = "identity",
                               fill = "tomato") +
                      scale_y_continuous(expand = c(0, 0), 
                                         labels = scales::dollar_format()) +
                      scale_x_continuous(breaks = seq(2012, 2021, by = 1)) +
                      labs(x = "Incident Year",
                           y = "Funds ($)",
                           subtitle = "Requested")
req_fig_year

#repeat for obligated funds
obl_fig_year <- EDAdata %>%
                    ggplot2::ggplot(aes(x = IncidentYear)) +
                      geom_bar(aes(y = OblAmt),
                               stat = "identity",
                               fill = "turquoise4") +
                      scale_y_continuous(expand = c(0, 0),
                                         labels = scales::dollar_format()) +
                      scale_x_continuous(breaks = seq(2012, 2021, by = 1)) +
                      labs(x = "Incident Year",
                           y = "Funds ($)",
                           subtitle = "Obligated")
obl_fig_year

#combine into one plot
plot_col <- cowplot::plot_grid(req_fig_year, 
                                     obl_fig_year,
                                     ncol = 1,
                                     align = "v",
                                     rel_heights = c(0.5,0.5))
plot_col

title1 <- cowplot::ggdraw() + 
                    draw_label(
                      "Requested vs obligated FEMA funds per year",
                      fontface = 'bold',
                      x = 0,
                      hjust = 0) +
                    theme( plot.margin = margin(0, 0, 0, 7))

funds_fig_year <- cowplot::plot_grid(
                              title1, plot_col,
                              ncol = 1,
                              # rel_heights values control vertical title margins
                              rel_heights = c(0.1, 1))
funds_fig_year

#save file
funds_year_fig_file = here("results","funds-years.png")
ggsave(filename = funds_year_fig_file, plot = funds_fig_year)
## Saving 7 x 5 in image
#create a table where columns represent requested and obligated funds and rows are 5 most disaster-prone states
decs_year_3 <- EDAdata %>%
                    dplyr::add_count(IncidentYear) %>%
                    dplyr::filter(dense_rank(-n) < 4) %>%
                    gtsummary::select(IncidentYear, ReqAmt, OblAmt) %>%
                    gtsummary::tbl_summary(
                      by = IncidentYear,
                      label = list(IncidentYear ~ "Year",
                                   ReqAmt ~ "Requested Funds",
                                   OblAmt ~ "Obligated Funds"),
                      statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
                    gtsummary::modify_header(label ~ "**Variable**") %>% 
                    gtsummary::modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Year**") %>%
                    gtsummary::modify_caption("**Table 2. FEMA Funding for the 3 Costliest Years (2012 - 2021)**") %>%
                    gtsummary::bold_labels()
decs_year_3
Table 2. FEMA Funding for the 3 Costliest Years (2012 - 2021)
Variable Year
2012, N = 431 2017, N = 481 2020, N = 1151
Requested Funds 8,523,455 (49,075,292) 116,482,973 (638,527,353) 54,834,301 (104,214,883)
Obligated Funds 8,740,375 (49,166,434) 116,915,384 (641,357,763) 54,838,235 (104,217,430)

1 Mean (SD)

#save file
decs_year_file = here("results", "decs-year-3.Rds")
saveRDS(decs_year_3, file = decs_year_file)

Incident Month

#summary statistics since it's continuous
base::summary(EDAdata$IncidentMonth)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   6.000   5.617   9.000  12.000
#examine graphical relationship with outcomes

#density plot
ggplot2::ggplot(data = EDAdata, aes(x = IncidentMonth)) +
  geom_density()

#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = IncidentMonth)) +
  geom_boxplot()

##QQ plot
car::qqPlot(EDAdata$IncidentMonth)

## [1] 16 23
#histogram distribution
month_hist <- EDAdata %>% 
               ggplot2::ggplot(aes(x=IncidentMonth)) + 
                        geom_bar(fill="darkseagreen4") + 
                        scale_x_continuous(breaks = seq(1, 12, by = 1),
                                           labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                                      "Jul", "Aug", "Sept", "Oct", "Nov", "Dec")) +
                        scale_y_continuous(expand = c(0, 0), limits = c(0, 110)) +
                        geom_text(stat='count', aes(label=..count..), vjust=-1) +
                        labs(x = "Declaration Month",
                             y = "Number of Declarations",
                             title = "Disaster Declarations per Month (2012 - 2021)")
month_hist

#save file
month_fig_file = here("results","incidentmonth.png")
ggsave(filename = month_fig_file, plot = month_hist)
## Saving 7 x 5 in image
#examine graphical relationship with outcomes of interest
#look at total cost per incident month
req_fig_month <- EDAdata %>%
                    ggplot2::ggplot(aes(x = IncidentMonth)) +
                      geom_bar(aes(y = ReqAmt),
                               stat = "identity",
                               fill = "tomato") +
                      scale_y_continuous(expand = c(0, 0), 
                                         labels = scales::dollar_format()) +
                      scale_x_continuous(breaks = seq(1, 12, by = 1),
                                           labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                                      "Jul", "Aug", "Sept", "Oct", "Nov", "Dec")) +
                      labs(x = "Declaration Month",
                           y = "Funds ($)",
                           subtitle = "Requested")
req_fig_month

#repeat for obligated funds
obl_fig_month <- EDAdata %>%
                    ggplot2::ggplot(aes(x = IncidentMonth)) +
                      geom_bar(aes(y = OblAmt),
                               stat = "identity",
                               fill = "turquoise4") +
                      scale_y_continuous(expand = c(0, 0),
                                         labels = scales::dollar_format()) +
                      scale_x_continuous(breaks = seq(1, 12, by = 1),
                                           labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                                      "Jul", "Aug", "Sept", "Oct", "Nov", "Dec")) +
                      labs(x = "Declaration Month",
                           y = "Funds ($)",
                           subtitle = "Obligated")
obl_fig_month

#combine into one plot
plot_col1 <- cowplot::plot_grid(req_fig_month, 
                                     obl_fig_month,
                                     ncol = 1,
                                     align = "v",
                                     rel_heights = c(0.5,0.5))
plot_col1

title2 <- cowplot::ggdraw() + 
                    draw_label(
                      "Requested vs obligated FEMA funds by declaration month",
                      fontface = 'bold',
                      x = 0,
                      hjust = 0) +
                    theme( plot.margin = margin(0, 0, 0, 7))

funds_fig_month <- cowplot::plot_grid(
                              title2, plot_col1,
                              ncol = 1,
                              # rel_heights values control vertical title margins
                              rel_heights = c(0.1, 1))
funds_fig_month

#save file
funds_month_fig_file = here("results","funds-month.png")
ggsave(filename = funds_month_fig_file, plot = funds_fig_month)
## Saving 7 x 5 in image
#create a table where columns represent requested and obligated funds and rows are 3 most distaster prone-months
decs_month_3 <- EDAdata %>%
                    dplyr::add_count(IncidentMonth) %>%
                    dplyr::filter(dense_rank(-n) < 4) %>%
                    gtsummary::select(IncidentMonth, ReqAmt, OblAmt) %>%
                    gtsummary::tbl_summary(
                      by = IncidentMonth,
                      label = list(IncidentMonth ~ "Month",
                                   ReqAmt ~ "Requested Funds",
                                   OblAmt ~ "Obligated Funds"),
                      statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
                    gtsummary::modify_header(label ~ "**Variable**") %>%
                    gtsummary::modify_header(update = list(
                                               stat_1 ~ "**Jan**",
                                               stat_2 ~ "**Sept**",
                                               stat_3 ~ "**Oct**"
                                             )) %>% 
                    gtsummary::modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Month**") %>%
                    gtsummary::modify_caption("**Table 3. FEMA Funding for the 3 Costliest Months (2012 - 2021)**") %>%
                    gtsummary::bold_labels()
decs_month_3
Table 3. FEMA Funding for the 3 Costliest Months (2012 - 2021)
Variable Month
Jan1 Sept1 Oct1
Requested Funds 63,229,783 (111,136,115) 100,831,890 (639,779,995) 22,992,573 (101,460,411)
Obligated Funds 63,239,449 (111,135,744) 101,252,822 (642,677,256) 23,165,582 (101,470,263)

1 Mean (SD)

#save file
decs_month_file = here("results", "decs-month-3.Rds")
saveRDS(decs_month_3, file = decs_month_file)

Predictor: Declaration Type

#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$declarationType, report.nas = FALSE)
## Frequencies  
## EDAdata$declarationType  
## Type: Character  
## 
##               Freq        %   % Cum.
## ----------- ------ -------- --------
##          DR    330    73.99    73.99
##          EM    116    26.01   100.00
##       Total    446   100.00   100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#start with obligated amount
obl_fig_DT <- EDAdata %>%
                    ggplot2::ggplot(aes(x = declarationType, y = OblAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
                      scale_y_continuous(labels = scales::dollar_format()) +
                      scale_x_discrete(labels = c("Major Disaster Declaration", "Emergency Declaration")) + 
                      labs(x = "Declaration Type",
                           y = "Obligated FEMA Funds",
                           title = "FEMA funds obligated for declaration types")
obl_fig_DT

#save file
obl_DT_fig_file = here("results","obl-DT.png")
ggsave(filename = obl_DT_fig_file, plot = obl_fig_DT)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_DT <- EDAdata %>%
                    ggplot2::ggplot(aes(x = declarationType, y = ReqAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
                      scale_y_continuous(labels = scales::dollar_format()) +
                      scale_x_discrete(labels = c("Major Disaster Declaration", "Emergency Declaration")) + 
                      labs(x = "Declaration Type",
                           y = "Requested FEMA Funds",
                           title = "FEMA funds requested for declaration types")
req_fig_DT

#save file
req_DT_fig_file = here("results","req-DT.png")
ggsave(filename = req_DT_fig_file, plot = req_fig_DT)
## Saving 7 x 5 in image
#create a table where rows represent requested and obligated funds and columns are declaration type
decs_dtype <- EDAdata %>%
                    gtsummary::select(declarationType, ReqAmt, OblAmt) %>%
                    gtsummary::tbl_summary(
                      by = declarationType,
                      label = list(declarationType ~ "Type",
                                   ReqAmt ~ "Requested Funds",
                                   OblAmt ~ "Obligated Funds"),
                      statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
                    gtsummary::modify_header(label ~ "**Variable**") %>% 
                    gtsummary::modify_header(update = list(
                                               stat_1 ~ "**Major Disaster**",
                                               stat_2 ~ "**Emergency**")) %>% 
                    gtsummary::modify_spanning_header(c("stat_1", "stat_2") ~ "**Type**") %>%
                    gtsummary::modify_caption("**Table 4. FEMA Funding for Declaration Types 2012 - 2021**") %>%
                    gtsummary::bold_labels()
decs_dtype
Table 4. FEMA Funding for Declaration Types 2012 - 2021
Variable Type
Major Disaster1 Emergency1
Requested Funds 40,010,548 (253,637,660) 267,101 (625,898)
Obligated Funds 40,105,724 (254,673,610) 267,101 (625,898)

1 Mean (SD)

#save file
decs_dtype_file = here("results", "decs-DT.Rds")
saveRDS(decs_dtype, file = decs_dtype_file)

Predictor: Incident Type

#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$incidentType, report.nas = FALSE)
## Frequencies  
## EDAdata$incidentType  
## Type: Character  
## 
##                          Freq        %   % Cum.
## ---------------------- ------ -------- --------
##             Biological     78    17.49    17.49
##               Chemical      1     0.22    17.71
##          Coastal Storm      4     0.90    18.61
##        Dam/Levee Break      2     0.45    19.06
##             Earthquake      5     1.12    20.18
##                   Fire     22     4.93    25.11
##                  Flood     75    16.82    41.93
##              Hurricane    105    23.54    65.47
##          Mud/Landslide      3     0.67    66.14
##                  Other      4     0.90    67.04
##       Severe Ice Storm     12     2.69    69.73
##        Severe Storm(s)    117    26.23    95.96
##                   Snow      5     1.12    97.09
##              Terrorist      1     0.22    97.31
##                Tornado     11     2.47    99.78
##                Volcano      1     0.22   100.00
##                  Total    446   100.00   100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#start with obligated amount
obl_fig_IT <- EDAdata %>%
                    ggplot2::ggplot(aes(x = incidentType, y = OblAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
                      scale_y_continuous(labels = scales::dollar_format()) +
                      labs(x = "Incident Type",
                           y = "Obligated FEMA Funds",
                           title = "FEMA funds obligated for incident types") +
                      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
obl_fig_IT

#save file
obl_IT_fig_file = here("results","obl-IT.png")
ggsave(filename = obl_IT_fig_file, plot = obl_fig_IT)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_IT <- EDAdata %>%
                    ggplot2::ggplot(aes(x = incidentType, y = ReqAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
                      scale_y_continuous(labels = scales::dollar_format()) + 
                      labs(x = "Incident Type",
                           y = "Requested FEMA Funds",
                           title = "FEMA funds requested for declaration types") +
                      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
req_fig_IT

#save file
req_IT_fig_file = here("results","req-IT.png")
ggsave(filename = req_IT_fig_file, plot = req_fig_IT)
## Saving 7 x 5 in image
#create a table where rows represent requested and obligated funds and columns are declaration type
decs_itype3 <- EDAdata %>%
                    dplyr::add_count(incidentType) %>%
                    dplyr::filter(dense_rank(-n) < 4) %>%
                    gtsummary::select(incidentType, ReqAmt, OblAmt) %>%
                    gtsummary::tbl_summary(
                      by = incidentType,
                      label = list(incidentType ~ "Type",
                                   ReqAmt ~ "Requested Funds",
                                   OblAmt ~ "Obligated Funds"),
                      statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
                    gtsummary::modify_header(label ~ "**Variable**") %>% 
                    gtsummary::modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Type**") %>%
                    gtsummary::modify_caption("**Table 4. FEMA Funding for Declaration Types 2012 - 2021**") %>%
                    gtsummary::bold_labels()
decs_itype3
Table 4. FEMA Funding for Declaration Types 2012 - 2021
Variable Type
Biological, N = 781 Hurricane, N = 1051 Severe Storm(s), N = 1171
Requested Funds 78,601,344 (119,067,029) 57,089,787 (430,915,934) 289,380 (902,722)
Obligated Funds 78,607,144 (119,069,157) 57,370,720 (432,833,622) 289,950 (903,592)

1 Mean (SD)

#save file
decs_itype3_file = here("results", "decs-IT-3.Rds")
saveRDS(decs_itype3, file = decs_itype3_file)

Predictor: Incident Duration

#convert class to numeric
EDAdata$IncidentDuration <- as.numeric(EDAdata$IncidentDuration, units="days")

#summary statistics since it's continuous
base::summary(EDAdata$IncidentDuration)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    4.00   12.00  128.25   38.72  660.00
#NAs are ongoing events

#density plot
ggplot2::ggplot(data = EDAdata, aes(x = IncidentDuration)) +
  geom_density()

#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = IncidentDuration)) +
  geom_boxplot()

##QQ plot
car::qqPlot(EDAdata$IncidentDuration)

## [1] 3 5
#examine graphical relationship with outcomes of interest
#look at total cost per incident month
req_fig_dur <- EDAdata %>%
                    ggplot2::ggplot(aes(x = IncidentDuration)) +
                      geom_point(aes(y = ReqAmt),
                                 color = "tomato",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      labs(x = "Incident Duration (Days)",
                           y = "Funds ($)",
                           subtitle = "Requested")
req_fig_dur
## Warning: Removed 6 rows containing missing values (geom_point).

#repeat for obligated funds
obl_fig_dur <- EDAdata %>%
                    ggplot2::ggplot(aes(x = IncidentDuration)) +
                      geom_point(aes(y = OblAmt),
                                 color = "turquoise4",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      labs(x = "Incident Duration (Days)",
                           y = "Funds ($)",
                           subtitle = "Obligated")
obl_fig_dur
## Warning: Removed 6 rows containing missing values (geom_point).

#combine into one plot
plot_col2 <- cowplot::plot_grid(req_fig_dur, 
                                     obl_fig_dur,
                                     ncol = 1,
                                     align = "v",
                                     rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).
plot_col2

title3 <- cowplot::ggdraw() + 
                    draw_label(
                      "Requested vs obligated FEMA funds by Incident Duration (Under $1B)",
                      fontface = 'bold',
                      x = 0,
                      hjust = 0) +
                    theme( plot.margin = margin(0, 0, 0, 7))

funds_fig_dur <- cowplot::plot_grid(
                              title3, plot_col2,
                              ncol = 1,
                              # rel_heights values control vertical title margins
                              rel_heights = c(0.1, 1))
funds_fig_dur

#save file
funds_dur_fig_file = here("results","funds-dur.png")
ggsave(filename = funds_dur_fig_file, plot = funds_fig_dur)
## Saving 7 x 5 in image

Predictor: Population

#convert class to numeric
EDAdata$Population <- as.numeric(gsub(",", "", EDAdata$Population))

#summary statistics since it's continuous
base::summary(EDAdata$Population)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   576851  2961279  5024279  8644007 10439388 39538223
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = Population)) +
  geom_density()

#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = Population)) +
  geom_boxplot()

##QQ plot
car::qqPlot(EDAdata$Population)

## [1] 32 33
#examine graphical relationship with outcomes of interest
#look at total cost per pop
req_fig_pop <- EDAdata %>%
                    ggplot2::ggplot(aes(x = Population)) +
                      geom_point(aes(y = ReqAmt),
                                 color = "tomato",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      scale_x_continuous(label = comma,
                                         limits = c(0, 40000000)) +
                      labs(x = "State Population",
                           y = "Funds ($)",
                           subtitle = "Requested")
req_fig_pop
## Warning: Removed 6 rows containing missing values (geom_point).

#repeat for obligated funds
obl_fig_pop <- EDAdata %>%
                    ggplot2::ggplot(aes(x = Population)) +
                      geom_point(aes(y = OblAmt),
                                 color = "turquoise4",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      scale_x_continuous(label = comma,
                                         limits = c(0, 40000000)) +
                      labs(x = "State Population",
                           y = "Funds ($)",
                           subtitle = "Obligated")
obl_fig_pop
## Warning: Removed 6 rows containing missing values (geom_point).

#combine into one plot
plot_col3 <- cowplot::plot_grid(req_fig_pop, 
                                     obl_fig_pop,
                                     ncol = 1,
                                     align = "v",
                                     rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).
plot_col3

title4 <- cowplot::ggdraw() + 
                    draw_label(
                      "Requested vs obligated FEMA funds by State Population (Under $1B)",
                      fontface = 'bold',
                      x = 0,
                      hjust = 0) +
                    theme( plot.margin = margin(0, 0, 0, 7))

funds_fig_pop <- cowplot::plot_grid(
                              title4, plot_col3,
                              ncol = 1,
                              # rel_heights values control vertical title margins
                              rel_heights = c(0.1, 1))
funds_fig_pop

#save file
funds_pop_fig_file = here("results","funds-pop.png")
ggsave(filename = funds_pop_fig_file, plot = funds_fig_pop)
## Saving 7 x 5 in image

Predictor: Total Number of Agencies Involved

#summary statistics since it's continuous
base::summary(EDAdata$TotalAgencies)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   2.000   4.908   7.000  55.000
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = TotalAgencies)) +
  geom_density()

#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = TotalAgencies)) +
  geom_boxplot()

##QQ plot
car::qqPlot(EDAdata$TotalAgencies)

## [1] 328 384
#examine graphical relationship with outcomes of interest
#look at total cost per incident month
req_fig_ag <- EDAdata %>%
                    ggplot2::ggplot(aes(x = TotalAgencies)) +
                      geom_point(aes(y = ReqAmt),
                                 color = "tomato",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      labs(x = "Number of Federal Agencies",
                           y = "Funds ($)",
                           subtitle = "Requested")
req_fig_ag
## Warning: Removed 6 rows containing missing values (geom_point).

#repeat for obligated funds
obl_fig_ag <- EDAdata %>%
                    ggplot2::ggplot(aes(x = TotalAgencies)) +
                      geom_point(aes(y = OblAmt),
                                 color = "turquoise4",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      labs(x = "Number of Federal Agencies",
                           y = "Funds ($)",
                           subtitle = "Obligated")
obl_fig_ag
## Warning: Removed 6 rows containing missing values (geom_point).

#combine into one plot
plot_col4 <- cowplot::plot_grid(req_fig_ag, 
                                     obl_fig_ag,
                                     ncol = 1,
                                     align = "v",
                                     rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).
plot_col4

title5 <- cowplot::ggdraw() + 
                    draw_label(
                      "Requested vs obligated FEMA funds by number of federal agencies (Under $1B)",
                      fontface = 'bold',
                      x = 0,
                      hjust = 0) +
                    theme( plot.margin = margin(0, 0, 0, 7))

funds_fig_ag <- cowplot::plot_grid(
                              title5, plot_col4,
                              ncol = 1,
                              # rel_heights values control vertical title margins
                              rel_heights = c(0.1, 1))
funds_fig_ag

#save file
funds_ag_fig_file = here("results","funds-ag.png")
ggsave(filename = funds_ag_fig_file, plot = funds_fig_ag)
## Saving 7 x 5 in image

Predictor: Number of Counties

#summary statistics since it's continuous
base::summary(EDAdata$Counties)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   46.00   67.00   74.78   92.75  254.00
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = Counties)) +
  geom_density()

#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = Counties)) +
  geom_boxplot()

##QQ plot
car::qqPlot(EDAdata$Counties)

## [1] 372 373
#examine graphical relationship with outcomes of interest
#look at total cost per incident month
req_fig_count <- EDAdata %>%
                    ggplot2::ggplot(aes(x = Counties)) +
                      geom_point(aes(y = ReqAmt),
                                 color = "tomato",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      labs(x = "Number of Counties",
                           y = "Funds ($)",
                           subtitle = "Requested")
req_fig_count
## Warning: Removed 6 rows containing missing values (geom_point).

#repeat for obligated funds
obl_fig_count <- EDAdata %>%
                    ggplot2::ggplot(aes(x = Counties)) +
                      geom_point(aes(y = OblAmt),
                                 color = "turquoise4",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      labs(x = "Number of Counties",
                           y = "Funds ($)",
                           subtitle = "Obligated")
obl_fig_count
## Warning: Removed 6 rows containing missing values (geom_point).

#combine into one plot
plot_col5 <- cowplot::plot_grid(req_fig_count, 
                                     obl_fig_count,
                                     ncol = 1,
                                     align = "v",
                                     rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).
plot_col5

title6 <- cowplot::ggdraw() + 
                    draw_label(
                      "Requested vs obligated FEMA funds by number of counties (Under $1B)",
                      fontface = 'bold',
                      x = 0,
                      hjust = 0) +
                    theme( plot.margin = margin(0, 0, 0, 7))

funds_fig_count <- cowplot::plot_grid(
                              title6, plot_col5,
                              ncol = 1,
                              # rel_heights values control vertical title margins
                              rel_heights = c(0.1, 1))
funds_fig_count

#save file
funds_count_fig_file = here("results","funds-count.png")
ggsave(filename = funds_count_fig_file, plot = funds_fig_count)
## Saving 7 x 5 in image

Predictor: FEMA Cost Share Average

#summary statistics since it's continuous
base::summary(EDAdata$FEMACSAvg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.8750  0.9780  0.8062  1.0000  1.0000
#density plot
ggplot2::ggplot(data = EDAdata, aes(x = FEMACSAvg)) +
  geom_density()

#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = FEMACSAvg)) +
  geom_boxplot()

##QQ plot
car::qqPlot(EDAdata$FEMACSAvg)

## [1] 1 2
#examine graphical relationship with outcomes of interest
#look at total cost per incident month
req_fig_cs <- EDAdata %>%
                    ggplot2::ggplot(aes(x = FEMACSAvg)) +
                      geom_point(aes(y = ReqAmt),
                                 color = "tomato",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      labs(x = "Average FEMA Cost Share",
                           y = "Funds ($)",
                           subtitle = "Requested")
req_fig_cs
## Warning: Removed 6 rows containing missing values (geom_point).

#repeat for obligated funds
obl_fig_cs <- EDAdata %>%
                    ggplot2::ggplot(aes(x = FEMACSAvg)) +
                      geom_point(aes(y = OblAmt),
                                 color = "turquoise4",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      labs(x = "Average FEMA Cost Share",
                           y = "Funds ($)",
                           subtitle = "Obligated")
obl_fig_cs
## Warning: Removed 6 rows containing missing values (geom_point).

#combine into one plot
plot_col6 <- cowplot::plot_grid(req_fig_cs, 
                                     obl_fig_cs,
                                     ncol = 1,
                                     align = "v",
                                     rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).
plot_col6

title7 <- cowplot::ggdraw() + 
                    draw_label(
                      "Requested vs obligated FEMA funds by avg FEMA cost share (Under $1B)",
                      fontface = 'bold',
                      x = 0,
                      hjust = 0) +
                    theme( plot.margin = margin(0, 0, 0, 7))

funds_fig_cs <- cowplot::plot_grid(
                              title7, plot_col6,
                              ncol = 1,
                              # rel_heights values control vertical title margins
                              rel_heights = c(0.1, 1))
funds_fig_cs

#save file
funds_cs_fig_file = here("results","funds-cs.png")
ggsave(filename = funds_cs_fig_file, plot = funds_fig_cs)
## Saving 7 x 5 in image

Predictor: IH Program

#convert to factor
EDAdata$IH <- as.factor(EDAdata$IH)

#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$IH, report.nas = FALSE)
## Frequencies  
## EDAdata$IH  
## Type: Factor  
## 
##               Freq        %   % Cum.
## ----------- ------ -------- --------
##           0    267    59.87    59.87
##           1    179    40.13   100.00
##       Total    446   100.00   100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#start with obligated amount
obl_fig_IH <- EDAdata %>%
                    ggplot2::ggplot(aes(x = IH, y = OblAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
                      scale_y_continuous(labels = scales::dollar_format()) +
                      scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
                      labs(x = "IH Program",
                           y = "Obligated FEMA Funds",
                           title = "FEMA funds obligated when IH program is awarded") +
                      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
obl_fig_IH

#save file
obl_IH_fig_file = here("results","obl-IH.png")
ggsave(filename = obl_IH_fig_file, plot = obl_fig_IH)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_IH <- EDAdata %>%
                    ggplot2::ggplot(aes(x = IH, y = ReqAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
                      scale_y_continuous(labels = scales::dollar_format()) + 
                      scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
                      labs(x = "IH Program",
                           y = "Requested FEMA Funds",
                           title = "FEMA funds requested when IH program is awarded") +
                      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
req_fig_IH

#save file
req_IH_fig_file = here("results","req-IH.png")
ggsave(filename = req_IH_fig_file, plot = req_fig_IH)
## Saving 7 x 5 in image
#create a table where rows represent requested and obligated funds and columns are declaration type
decs_ih <- EDAdata %>%
                    gtsummary::select(IH, ReqAmt, OblAmt) %>%
                    gtsummary::tbl_summary(
                      by = IH,
                      label = list(IH ~ "Individuals / Households Program",
                                   ReqAmt ~ "Requested Funds",
                                   OblAmt ~ "Obligated Funds"),
                      statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
                    gtsummary::modify_header(label ~ "**Variable**") %>% 
                    gtsummary::modify_header(update = list(
                                               stat_1 ~ "**Not Awarded**",
                                               stat_2 ~ "**Awarded**")) %>% 
                    gtsummary::modify_spanning_header(c("stat_1", "stat_2") ~ "**Individuals/Households Program**") %>%
                    gtsummary::modify_caption("**Table 5. FEMA Funding by IH Program Awardance 2012 - 2021**") %>%
                    gtsummary::bold_labels()
decs_ih
Table 5. FEMA Funding by IH Program Awardance 2012 - 2021
Variable Individuals/Households Program
Not Awarded1 Awarded1
Requested Funds 293,100 (784,973) 73,498,363 (341,233,996)
Obligated Funds 293,100 (784,973) 73,673,828 (342,639,903)

1 Mean (SD)

#save file
decs_ih_file = here("results", "decs-IH.Rds")
saveRDS(decs_ih, file = decs_ih_file)

Predictor: PA Program

#convert to factor
EDAdata$PA <- as.factor(EDAdata$PA)

#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$PA, report.nas = FALSE)
## Frequencies  
## EDAdata$PA  
## Type: Factor  
## 
##               Freq        %   % Cum.
## ----------- ------ -------- --------
##           0     12     2.69     2.69
##           1    434    97.31   100.00
##       Total    446   100.00   100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#start with obligated amount
obl_fig_PA <- EDAdata %>%
                    ggplot2::ggplot(aes(x = PA, y = OblAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
                      scale_y_continuous(labels = scales::dollar_format()) +
                      scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
                      labs(x = "PA Program",
                           y = "Obligated FEMA Funds",
                           title = "FEMA funds obligated when PA program is awarded") +
                      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
obl_fig_PA

#save file
obl_PA_fig_file = here("results","obl-PA.png")
ggsave(filename = obl_PA_fig_file, plot = obl_fig_PA)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_PA <- EDAdata %>%
                    ggplot2::ggplot(aes(x = PA, y = ReqAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
                      scale_y_continuous(labels = scales::dollar_format()) + 
                      scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
                      labs(x = "PA Program",
                           y = "Requested FEMA Funds",
                           title = "FEMA funds requested when PA program is awarded") +
                      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
req_fig_PA

#save file
req_PA_fig_file = here("results","req-PA.png")
ggsave(filename = req_PA_fig_file, plot = req_fig_PA)
## Saving 7 x 5 in image
#create a table where rows represent requested and obligated funds and columns are declaration type
decs_pa <- EDAdata %>%
                    gtsummary::select(PA, ReqAmt, OblAmt) %>%
                    gtsummary::tbl_summary(
                      by = PA,
                      label = list(PA ~ "Public Assistance Program",
                                   ReqAmt ~ "Requested Funds",
                                   OblAmt ~ "Obligated Funds"),
                      statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
                    gtsummary::modify_header(label ~ "**Variable**") %>% 
                    gtsummary::modify_header(update = list(
                                               stat_1 ~ "**Not Awarded**",
                                               stat_2 ~ "**Awarded**")) %>% 
                    gtsummary::modify_spanning_header(c("stat_1", "stat_2") ~ "**Public Assistance Program**") %>%
                    gtsummary::modify_caption("**Table 6. FEMA Funding by PA Program Awardance 2012 - 2021**") %>%
                    gtsummary::bold_labels()
decs_pa
Table 6. FEMA Funding by PA Program Awardance 2012 - 2021
Variable Public Assistance Program
Not Awarded1 Awarded1
Requested Funds 162,348 (355,421) 30,489,669 (221,740,562)
Obligated Funds 162,348 (355,421) 30,562,038 (222,644,037)

1 Mean (SD)

#save file
decs_pa_file = here("results", "decs-PA.Rds")
saveRDS(decs_pa, file = decs_pa_file)

Predictor: HM Program

#convert to factor
EDAdata$HM <- as.factor(EDAdata$HM)

#frequency table since it's categorical
#hide NAs because there are no NAs
summarytools::freq(EDAdata$HM, report.nas = FALSE)
## Frequencies  
## EDAdata$HM  
## Type: Factor  
## 
##               Freq        %   % Cum.
## ----------- ------ -------- --------
##           0    117    26.23    26.23
##           1    329    73.77   100.00
##       Total    446   100.00   100.00
#examine graphical relationship with outcomes
#include a jitter function to have a better idea of number of measurements and distribution
#start with obligated amount
obl_fig_HM <- EDAdata %>%
                    ggplot2::ggplot(aes(x = HM, y = OblAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "tomato") +
                      scale_y_continuous(labels = scales::dollar_format()) +
                      scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
                      labs(x = "HM Program",
                           y = "Obligated FEMA Funds",
                           title = "FEMA funds obligated when HM program is awarded") +
                      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
obl_fig_HM

#save file
obl_HM_fig_file = here("results","obl-HM.png")
ggsave(filename = obl_HM_fig_file, plot = obl_fig_HM)
## Saving 7 x 5 in image
#repeat for requested funds
req_fig_HM <- EDAdata %>%
                    ggplot2::ggplot(aes(x = HM, y = ReqAmt)) +
                      geom_boxplot() +
                      geom_jitter(alpha = 0.5, width = 0.2, height = 0.2, color = "turquoise4") +
                      scale_y_continuous(labels = scales::dollar_format()) + 
                      scale_x_discrete(labels = c("Not Awarded", "Awarded")) +
                      labs(x = "HM Program",
                           y = "Requested FEMA Funds",
                           title = "FEMA funds requested when HM program is awarded") +
                      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
req_fig_HM

#save file
req_HM_fig_file = here("results","req-HM.png")
ggsave(filename = req_HM_fig_file, plot = req_fig_HM)
## Saving 7 x 5 in image
#create a table where rows represent requested and obligated funds and columns are declaration type
decs_hm <- EDAdata %>%
                    gtsummary::select(HM, ReqAmt, OblAmt) %>%
                    gtsummary::tbl_summary(
                      by = HM,
                      label = list(HM ~ "Hazard Mitigation Program",
                                   ReqAmt ~ "Requested Funds",
                                   OblAmt ~ "Obligated Funds"),
                      statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
                    gtsummary::modify_header(label ~ "**Variable**") %>% 
                    gtsummary::modify_header(update = list(
                                               stat_1 ~ "**Not Awarded**",
                                               stat_2 ~ "**Awarded**")) %>% 
                    gtsummary::modify_spanning_header(c("stat_1", "stat_2") ~ "**Hazard Mitigation Program**") %>%
                    gtsummary::modify_caption("**Table 7. FEMA Funding by HM Program Awardance 2012 - 2021**") %>%
                    gtsummary::bold_labels()
decs_hm
Table 7. FEMA Funding by HM Program Awardance 2012 - 2021
Variable Hazard Mitigation Program
Not Awarded1 Awarded1
Requested Funds 264,949 (623,629) 40,132,114 (254,014,380)
Obligated Funds 264,949 (623,629) 40,227,580 (255,051,902)

1 Mean (SD)

#save file
decs_hm_file = here("results", "decs-HM.Rds")
saveRDS(decs_hm, file = decs_hm_file)

Predictor: Response Duration

#summary statistics since it's continuous
base::summary(EDAdata$ResponseDays)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   70.04  660.00 1098.00 1332.20 2018.23 3588.00
#NAs are ongoing events

#density plot
ggplot2::ggplot(data = EDAdata, aes(x = ResponseDays)) +
  geom_density()

#normal boxplot
ggplot2::ggplot(data = EDAdata, aes(y = ResponseDays)) +
  geom_boxplot()

##QQ plot
car::qqPlot(EDAdata$ResponseDays)

## [1] 413 145
#examine graphical relationship with outcomes of interest
req_fig_resp <- EDAdata %>%
                    ggplot2::ggplot(aes(x = ResponseDays)) +
                      geom_point(aes(y = ReqAmt),
                                 color = "tomato",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      labs(x = "Response Duration (Days)",
                           y = "Funds ($)",
                           subtitle = "Requested")
req_fig_resp
## Warning: Removed 6 rows containing missing values (geom_point).

#repeat for obligated funds
obl_fig_resp <- EDAdata %>%
                    ggplot2::ggplot(aes(x = ResponseDays)) +
                      geom_point(aes(y = OblAmt),
                                 color = "turquoise4",
                                 show.legend = FALSE) +
                      scale_y_continuous(expand = c(0, 0),
                                         limits = c(0, 1000000000),
                                         labels = scales::dollar_format()) +
                      labs(x = "Response Duration (Days)",
                           y = "Funds ($)",
                           subtitle = "Obligated")
obl_fig_resp
## Warning: Removed 6 rows containing missing values (geom_point).

#combine into one plot
plot_col7 <- cowplot::plot_grid(req_fig_resp, 
                                     obl_fig_resp,
                                     ncol = 1,
                                     align = "v",
                                     rel_heights = c(0.5,0.5))
## Warning: Removed 6 rows containing missing values (geom_point).

## Warning: Removed 6 rows containing missing values (geom_point).
plot_col7

title8 <- cowplot::ggdraw() + 
                    draw_label(
                      "Requested vs obligated FEMA funds by response duration (Under $1B)",
                      fontface = 'bold',
                      x = 0,
                      hjust = 0) +
                    theme( plot.margin = margin(0, 0, 0, 7))

funds_fig_resp <- cowplot::plot_grid(
                              title8, plot_col7,
                              ncol = 1,
                              # rel_heights values control vertical title margins
                              rel_heights = c(0.1, 1))
funds_fig_resp

#save file
funds_resp_fig_file = here("results","funds-resp.png")
ggsave(filename = funds_resp_fig_file, plot = funds_fig_resp)
## Saving 7 x 5 in image

Create “Table 1”

Make the summary table for the manuscript. Variables to include:

#label factors
EDAdata$IH <- factor(EDAdata$IH, 
                     levels = c(0, 1),
                     labels = c("Not Awarded", "Awarded"))
EDAdata$PA <- factor(EDAdata$PA, 
                     levels = c(0, 1),
                     labels = c("Not Awarded", "Awarded"))
EDAdata$HM <- factor(EDAdata$HM, 
                     levels = c(0, 1),
                     labels = c("Not Awarded", "Awarded"))

#specify table labels
table1::label(EDAdata$state) <- "State"
table1::label(EDAdata$IncidentYear) <- "Incident Year"
table1::label(EDAdata$IncidentMonth) <- "Declaration Month"
table1::label(EDAdata$declarationType) <- "Declaration Type"
table1::label(EDAdata$incidentType) <- "Incident Type"
table1::label(EDAdata$IncidentDuration) <- "Incident Duration"
table1::label(EDAdata$ResponseDays) <- "Response Duration"
table1::label(EDAdata$Population) <- "State Population"
table1::label(EDAdata$Counties) <- "Counties per State"
table1::label(EDAdata$TotalAgencies) <- "Federal Agencies Involved"
table1::label(EDAdata$FEMACSAvg) <- "Average FEMA Cost Share"
table1::label(EDAdata$IH) <- "Individuals/Households Program"
table1::label(EDAdata$PA) <- "Public Assistance Program"
table1::label(EDAdata$HM) <- "Hazard Mitigation Program"
table1::label(EDAdata$ReqAmt) <- "Requested FEMA Funds"
table1::label(EDAdata$OblAmt) <- "Obligated FEMA Funds"

#specify units
table1::units(EDAdata$IncidentDuration) <- "days"
table1::units(EDAdata$ResponseDays) <- "days"

#create table
table1 <- table1::table1(~ ReqAmt + OblAmt + state + IncidentYear + IncidentMonth + declarationType + incidentType +
                           IncidentDuration + ResponseDays + IH + PA + HM + Population + Counties + TotalAgencies + FEMACSAvg,
                          data = EDAdata)

#save file
table1_file = here("results", "table1.Rds")
saveRDS(table1, file = table1_file)

Save Data

Since we’ve made some changes to the format of the processed data, let’s save it for the analysis.

#location to save processed file
analysis_data_location <- here::here("data","processed_data","analysisdata.rds")

#save as an RDS
saveRDS(EDAdata, file = analysis_data_location)